ABSTRACT


In a recent paper, Bickel and Doksum (1981) argue that the performance of
a method for estimating transformations due to Box and Cox is "unstable"
because estimates of the transformation parameter on the one hand, and of the
remaining parameters on the other, can be highly correlated. In this note it
is argued that while the criticisms are qualitatively obvious, they are
scientifically irrelevant.
SIGNIFICANCE AND EXPLANATION


In a well-known paper written in 1964, Box and Cox described a method for
estimating transformations and showed how in suitable cases valuable increases
in simplicity and efficiency were possible. Since that time, this technique
has enjoyed wide practical use and considerable success. However, a recent
theoretical paper by Bickel and Doksum (1981) seems to suggest that serious
dangers are associated with the employment of this method, and speaks of
"instability" and "cost" of estimation of the transformation. These
difficulties seem to be associated with

(1) examples which common sense would rule out, namely situations where
the effect of transformation on the data is almost linear, so that it is a
matter of indifference which transformation is used;

(2) the idea that it makes sense to state conclusions in terms of a
number measured on an arbitrary scale;


(3) failure to take proper account of the Jacobian of the transformation.

The responsibility for the wording and views expressed in this descriptive
summary lies with MRC, and not with the authors of this report.


AN ANALYSIS OF TRANSFORMATIONS REVISITED, REBUTTED


G. E. P. Box and D. R. Cox

Transformation has long been a powerful tool in developing parsimonious
representations and interpretations of data. In 1964 we examined the formal
estimation of a suitable transformation. In particular, suppose that a
response $y$ is transformed to $y^{(\lambda)}$, where

$$
y^{(\lambda)} = \begin{cases} (y^{\lambda} - 1)/\lambda & (\lambda \neq 0), \\ \log y & (\lambda = 0), \end{cases}
$$

and that we assume provisionally that for some unknown $\lambda$, the vector
$\mathbf{y}^{(\lambda)} = (y_1^{(\lambda)}, \ldots, y_n^{(\lambda)})$ of $n$ transformed observations satisfies a linear
model

$$
E(\mathbf{y}^{(\lambda)}) = X\theta,
$$

where $\theta$ is unknown, the errors being independently normally distributed with
zero mean and constant variance $\sigma^2$. Estimation of $\lambda$, $\theta$, and $\sigma^2$ can be by
Bayesian or maximum likelihood methods.
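The maximum likelihood route can be sketched briefly. This is a minimal illustration under stated assumptions, not a transcription of the 1964 procedure: it takes the simplest design, a one-way layout in which $X$ holds one indicator column per group, so that the residual sum of squares is the pooled within-group sum of squares. It then profiles out $\theta$ and $\sigma^2$ and searches a user-supplied grid of $\lambda$ values. The helper names (`box_cox`, `profile_loglik`, `estimate_lambda`) and the data are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform y^(lam) of a single positive observation."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def profile_loglik(groups, lam):
    """Log-likelihood maximized over theta and sigma^2 for fixed lam (up to a constant).

    Assumption: E(y^(lam)) = X theta is a one-way layout (one indicator
    column per group), so the residual sum of squares is the pooled
    within-group sum of squares of the transformed data.
    """
    n = sum(len(g) for g in groups)
    rss = 0.0
    sum_log_y = 0.0
    for g in groups:
        t = [box_cox(y, lam) for y in g]
        mean_t = sum(t) / len(t)
        rss += sum((v - mean_t) ** 2 for v in t)
        sum_log_y += sum(math.log(y) for y in g)
    sigma2_hat = rss / n  # ML estimate of sigma^2 for this lam
    # The Jacobian of y -> y^(lam) contributes (lam - 1) * sum(log y).
    return -0.5 * n * math.log(sigma2_hat) + (lam - 1.0) * sum_log_y


def estimate_lambda(groups, grid):
    """Grid value of lam that maximizes the profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(groups, lam))


# Hypothetical data whose within-group spread grows in proportion to the group level.
groups = [[1.0, 2.0, 4.0], [10.0, 20.0, 40.0], [100.0, 200.0, 400.0]]
grid = [i / 20 for i in range(-40, 41)]  # lambda from -2 to 2 in steps of 0.05
lam_hat = estimate_lambda(groups, grid)
print("lambda-hat =", lam_hat)
```

For these made-up groups, whose spread is proportional to their level, the search settles on the log transformation. A grid is used only to keep the sketch dependency-free; a one-dimensional optimizer would serve equally well.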

Bickel and Doksum (1981), in a recent and technically very impressive paper,
have in particular studied the joint estimation of $\lambda$ and $\theta$, examining
consistency and asymptotic variances. They report that the cost of not
knowing $\lambda$ and having to estimate it can be severe; that "...the performance
of all Box-Cox type procedures is unstable and highly dependent on the
parameters of the model in structured models with small to moderate error
variances." That is, the estimates $\hat{\lambda}$ and $\hat{\theta}$ can be highly correlated, so
that the marginal variances of the $\hat{\theta}$'s can be inflated by large factors over
the conditional variances for fixed $\lambda$.
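The size of that inflation follows from a standard identity and involves nothing specific to Box-Cox models. If $(\hat{\lambda}, \hat{\theta})$ is treated as approximately bivariate normal with correlation $\rho$, the variance of $\hat{\theta}$ for fixed $\lambda$ is $(1 - \rho^2)$ times its marginal variance, so the marginal variance exceeds the conditional one by the factor $1/(1 - \rho^2)$. A minimal sketch, with a hypothetical covariance chosen only for illustration:

```python
import math


def conditional_variance(var_lam, var_theta, cov):
    """Variance of theta-hat for fixed lambda, under a bivariate normal approximation."""
    return var_theta - cov ** 2 / var_lam


def inflation_factor(var_lam, var_theta, cov):
    """Ratio of the marginal to the conditional variance of theta-hat."""
    return var_theta / conditional_variance(var_lam, var_theta, cov)


# Hypothetical asymptotic covariance: correlation 0.99 between lambda-hat and theta-hat.
rho = 0.99
var_lam, var_theta = 1.0, 1.0
cov = rho * math.sqrt(var_lam * var_theta)
factor = inflation_factor(var_lam, var_theta, cov)  # equals 1 / (1 - rho**2), about 50
print(f"variance inflation factor: {factor:.2f}")
```

A correlation of 0.99 therefore inflates the variance roughly fiftyfold; the question taken up next is whether that inflation has any scientific meaning.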

It seems to us that this general conclusion is qualitatively obvious
and at the same time scientifically irrelevant.

To illustrate first the obviousness, take as a simple example the
comparison of two groups of modest size, the observations $y$ in group 1 being
near 995 and those in group 2 being near 1005, the scatters within the two
groups being roughly normal with standard deviations close to unity. A
parameter $\theta$ representing the difference between groups on the $y$-scale is
quite precisely estimated to be about 10 $y$-units. Suppose that the
possibility of transformation were contemplated. For a very wide range
of $\lambda$ the function $y^{(\lambda)}$ is very nearly linear in $y$ over the span of the data,
and, in particular, unless the sample sizes were very large indeed, it would
be quite impossible to distinguish from the data whether $y$ or $y^{-1}$ gave a better
fit to the standard normal assumptions. If the parameter $\theta$ were to refer to
a difference on the $y^{-1}$ scale, it would be quite precisely estimated to be near
$-10^{-5}$ $y^{-1}$-units (or $10^{-5}$ $y^{(-1)}$-units, where $y^{(-1)} = (1/y - 1)/(-1)$). Thus,
if the target parameter $\theta$ is defined in terms of unknown $\lambda$ in a case such as
this, where $\lambda$ is poorly determined, the numerical value of
$\theta$ (in units of $y^{(\lambda)}$ or $y^{\lambda}$) could be virtually anything.
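The arithmetic of this example is easy to check. The sketch below simply evaluates $y^{(\lambda)}$ at the two group levels, 995 and 1005, for a few values of $\lambda$; treating the groups as single points is a simplification for illustration, and the function names are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform y^(lam) of a positive number."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def group_difference(lam, low=995.0, high=1005.0):
    """Difference between the two group levels measured on the y^(lam) scale."""
    return box_cox(high, lam) - box_cox(low, lam)


for lam in (-1.0, 0.0, 1.0, 2.0):
    # Near y = 1000 the slope of y^(lam) is about 1000**(lam - 1),
    # so each difference is roughly 10 * 1000**(lam - 1).
    print(f"lambda = {lam:+.0f}: difference = {group_difference(lam):.6g}")
```

The printed differences are roughly $10^{-5}$, $10^{-2}$, 10 and $10^{4}$: the same comparison, expressed in four different units, which is exactly why a bare number for $\theta$ carries no information until the scale is fixed.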

As to the scientific implications of this, how can it be sensible
scientifically to state a conclusion as a number measured on an unknown
scale? Surely to know that some effect has magnitude 10 units is without
content unless one knows the scale and units in which the effect is defined.

To say in the above idealized example that $\theta$, defining the difference
between groups, is ill determined, because the data establish a wide range of
functions as virtually equivalent, seems to be very misleading.

There is, of course, no dispute with Bickel and Doksum over
mathematics: the issue is one of scientific relevance. As with any procedure
it is necessary to use some common sense in estimating transformations, and in
particular (see for example Box et al. (1978), p. 241) not to expect this to be
possible or relevant when for the particular data and the particular class of
transformations in mind the transformation is essentially linear.
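One rough way to see whether a transformation is "essentially linear" over the data is to compare $y^{(\lambda)}$ with the straight line joining its values at the smallest and largest observations. The diagnostic below is a hypothetical illustration of that idea, not a measure proposed in this note or in Box et al. (1978); the function names and the data ranges are made up for the example.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform y^(lam) of a positive number."""
    if abs(lam) < 1e-12:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def relative_nonlinearity(y_min, y_max, lam, steps=200):
    """Hypothetical diagnostic: largest gap between y^(lam) and its chord on
    [y_min, y_max], as a fraction of the change in y^(lam) over that interval."""
    f_lo, f_hi = box_cox(y_min, lam), box_cox(y_max, lam)
    span = f_hi - f_lo
    worst = 0.0
    for i in range(steps + 1):
        t = i / steps
        y = y_min + (y_max - y_min) * t
        chord = f_lo + span * t  # straight line through the two end values
        worst = max(worst, abs(box_cox(y, lam) - chord))
    return worst / abs(span)


# Data confined to about 992..1008, as in the two-group example above.
narrow = max(relative_nonlinearity(992.0, 1008.0, lam) for lam in (-1.0, 0.0, 2.0))
# Data spanning a fiftyfold range.
wide = relative_nonlinearity(1.0, 50.0, -1.0)
print(f"narrow range: {narrow:.4f}, wide range: {wide:.4f}")
```

Over the narrow range the gap stays well under one per cent of the transformed range even for $\lambda = -1$ and $\lambda = 2$, so these transformations are practically linear and the data cannot discriminate among them; over the fiftyfold range the reciprocal bends sharply and the choice of $\lambda$ becomes a real question.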

Of course the gross correlation effects would be avoided if, following
our paper, the investigation had been conducted in terms of

$$
z^{(\lambda)} = \begin{cases} (y^{\lambda} - 1)/(\lambda \dot{y}^{\lambda - 1}) & (\lambda \neq 0), \\ \dot{y} \log y & (\lambda = 0), \end{cases}
$$

which takes account of the Jacobian of the transformation; here $\dot{y}$ denotes
the geometric mean of the observations. (For the above example the differences
in means for both $z^{(1)}$ and $z^{(-1)}$ would then have been very nearly ten units.)
However, some question of scientific relevance would still remain.
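The claim that the Jacobian-scaled differences stay near ten units can be checked on a few made-up observations. The sketch assumes three hypothetical values in each group, scattered by one unit around 995 and 1005, and computes $\dot{y}$ over all six; the function names are illustrative only.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def z_transform(y, lam, gm):
    """Jacobian-scaled transform z^(lam) = (y**lam - 1) / (lam * gm**(lam - 1))."""
    if abs(lam) < 1e-12:
        return gm * math.log(y)
    return (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))


def mean(values):
    return sum(values) / len(values)


# Hypothetical observations: group 1 near 995, group 2 near 1005.
group1 = [994.0, 995.0, 996.0]
group2 = [1004.0, 1005.0, 1006.0]
gm = geometric_mean(group1 + group2)  # geometric mean over all observations

differences = {}
for lam in (-1.0, 0.0, 1.0, 2.0):
    z1 = [z_transform(y, lam, gm) for y in group1]
    z2 = [z_transform(y, lam, gm) for y in group2]
    differences[lam] = mean(z2) - mean(z1)
    print(f"lambda = {lam:+.0f}: difference in means = {differences[lam]:.4f}")
```

Dividing by $\dot{y}^{\lambda - 1}$ restores the units of $y$, so every choice of $\lambda$ reports essentially the same ten-unit difference and the estimate of the group difference no longer swings with $\hat{\lambda}$.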

There are numerous aspects of transformations that merit further
study. These include in particular the further development of simple ways of
assessing transformation potential. That is, of providing some more formal
measure of the ability of particular data to provide useful information about
a class of transformations